UPSTREAM PR #19286: completion : simplify batch (embd) processing#1151
Conversation
This commit simplifies the processing of embd by removing the for loop that steps through it in increments of params.n_batch. It also removes the clamping of n_eval, since the size of embd is always at most params.n_batch. The motivation is to clarify the code: read in isolation, the loop suggests that it can process multiple batches, which it never does.
No meaningful performance changes were detected across 115468 analyzed functions in the following binaries: build.bin.llama-cvector-generator, build.bin.llama-tts, build.bin.libllama.so, build.bin.libmtmd.so, build.bin.llama-bench, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-gemma3-cli, build.bin.libggml.so, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.llama-tokenize, build.bin.llama-qwen2vl-cli. 🔎 Full breakdown: Loci Inspector.
Force-pushed 823244c to bab7d39
Force-pushed a92fe2a to 6495042
Force-pushed 5ac00d6 to 998dd7a
Force-pushed 8c39ead to 418d9f2
Force-pushed f6c9b75 to 6c480d8
Note
Source pull request: ggml-org/llama.cpp#19286